This report collects and compares selected techniques for imputing missing data, in terms of their impact on the predictive performance of classification algorithms. It covers both basic techniques (median/mode imputation) and more sophisticated ones (selected methods from the missForest, VIM, mice and missMDA packages).
As the classification algorithm we used ranger, a fast implementation of random forests that is particularly suited to high-dimensional data. Predictive performance was assessed with three measures: AUC, balanced accuracy (BACC) and the Matthews correlation coefficient (MCC).
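To make the three measures concrete, here is a minimal sketch of how they can be computed for a binary problem. It is written in plain Python for illustration only; the report's own pipeline was run in R, and the function names below (`confusion_counts`, `balanced_accuracy`, `mcc`, `auc`) are hypothetical helpers, not part of any package used in the report. BACC is the mean of sensitivity and specificity, MCC ranges from -1 to 1 with 0 corresponding to chance-level predictions, and AUC is computed here as the fraction of positive/negative pairs that the scores rank correctly.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (not taken from the report).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, balanced_accuracy(*counts), mcc(*counts))
```

The pairwise AUC loop is quadratic in the number of observations, which is fine for a sketch but would be replaced by a rank-based formula on datasets of realistic size.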

The report contains all the results, grouped by package and by dataset.

Basic (median/mode)
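As an illustration of the basic technique, the sketch below fills numeric columns with their median and all other columns with their mode. It is plain Python rather than the R code behind the report, and it rests on assumptions the report does not state: missing values are represented as `None`, and the fill values are learned on the training set and reused on the test set. The helper names `fit_imputer` and `apply_imputer` are hypothetical.

```python
from statistics import median, mode


def fit_imputer(rows, columns):
    """Learn one fill value per column: median if numeric, mode otherwise."""
    fill = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        if not observed:
            continue  # nothing to learn from; leave this column untouched
        if all(isinstance(v, (int, float)) for v in observed):
            fill[col] = median(observed)
        else:
            fill[col] = mode(observed)
    return fill


def apply_imputer(rows, fill):
    """Return copies of the rows with None replaced by the learned fill values."""
    return [
        {c: (fill[c] if v is None and c in fill else v) for c, v in r.items()}
        for r in rows
    ]


# Toy data, invented for illustration (not taken from the report's datasets).
train = [
    {"age": 30, "job": "a"},
    {"age": None, "job": "b"},
    {"age": 50, "job": "b"},
    {"age": 40, "job": None},
]
test = [{"age": None, "job": None}]

fill = fit_imputer(train, ["age", "job"])
train_imputed = apply_imputer(train, fill)
test_imputed = apply_imputer(test, fill)
print(fill, test_imputed)
```

Reusing the training-set fill values on the test set keeps test information out of the imputation step; whether the report's experiments were set up this way is not stated in the text.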

adult

Crossvalidation results

Imputation times

## Train set imputation time:  0.1
## Test set imputation time:  0.03

Test set results

## Test set AUC:  0.916
## Test set BACC:  0.781
## Test set MCC:  0.604

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.004
## Test set imputation time:  0.003

Test set results

## Test set AUC:  0.951
## Test set BACC:  0.889
## Test set MCC:  0.783

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.006
## Test set imputation time:  0.005

Test set results

## Test set AUC:  0.575
## Test set BACC:  0.591
## Test set MCC:  0.215

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.005
## Test set imputation time:  0.005

Test set results

## Test set AUC:  0.93
## Test set BACC:  0.874
## Test set MCC:  0.752

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.03
## Test set imputation time:  0.014

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  0.087
## Test set imputation time:  0.042

Test set results

## Test set AUC:  1
## Test set BACC:  0.98
## Test set MCC:  0.976

Missings overview

mice

adult

Crossvalidation results

Imputation times

## Train set imputation time:  5.263
## Test set imputation time:  0.708

Test set results

## Test set AUC:  0.915
## Test set BACC:  0.776
## Test set MCC:  0.598

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.093
## Test set imputation time:  0.06

Test set results

## Test set AUC:  0.966
## Test set BACC:  0.903
## Test set MCC:  0.81

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.341
## Test set imputation time:  0.172

Test set results

## Test set AUC:  0.494
## Test set BACC:  0.492
## Test set MCC:  -0.017

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.077
## Test set imputation time:  0.073

Test set results

## Test set AUC:  0.905
## Test set BACC:  0.85
## Test set MCC:  0.697

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.251
## Test set imputation time:  0.094

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.907
## Test set MCC:  0.874

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  170.446
## Test set imputation time:  48.945

Test set results

## Test set AUC:  1
## Test set BACC:  0.969
## Test set MCC:  0.963

Missings overview

K-Nearest Neighbors (VIM)

adult

Crossvalidation results

Imputation times

## Train set imputation time:  113.908
## Test set imputation time:  7.141

Test set results

## Test set AUC:  0.914
## Test set BACC:  0.775
## Test set MCC:  0.595

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.308
## Test set imputation time:  0.101

Test set results

## Test set AUC:  0.966
## Test set BACC:  0.889
## Test set MCC:  0.783

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.512
## Test set imputation time:  0.12

Test set results

## Test set AUC:  0.589
## Test set BACC:  0.607
## Test set MCC:  0.234

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.12
## Test set imputation time:  0.059

Test set results

## Test set AUC:  0.923
## Test set BACC:  0.83
## Test set MCC:  0.667

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  6.126
## Test set imputation time:  0.604

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  267.663
## Test set imputation time:  19.991

Test set results

## Test set AUC:  1
## Test set BACC:  0.978
## Test set MCC:  0.974

Missings overview

Hot Deck (VIM)

adult

Crossvalidation results

Imputation times

## Train set imputation time:  0.079
## Test set imputation time:  0.034

Test set results

## Test set AUC:  0.914
## Test set BACC:  0.779
## Test set MCC:  0.605

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.03
## Test set imputation time:  0.026

Test set results

## Test set AUC:  0.965
## Test set BACC:  0.901
## Test set MCC:  0.811

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.041
## Test set imputation time:  0.027

Test set results

## Test set AUC:  0.594
## Test set BACC:  0.594
## Test set MCC:  0.207

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.027
## Test set imputation time:  0.024

Test set results

## Test set AUC:  0.933
## Test set BACC:  0.868
## Test set MCC:  0.733

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.055
## Test set imputation time:  0.048

Test set results

## Test set AUC:  0.995
## Test set BACC:  0.917
## Test set MCC:  0.886

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  0.554
## Test set imputation time:  0.301

Test set results

## Test set AUC:  1
## Test set BACC:  0.964
## Test set MCC:  0.956

Missings overview

missRanger

adult

Crossvalidation results

Imputation times

## Train set imputation time:  21.599
## Test set imputation time:  3.953

Test set results

## Test set AUC:  0.915
## Test set BACC:  0.78
## Test set MCC:  0.604

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.569
## Test set imputation time:  0.188

Test set results

## Test set AUC:  0.959
## Test set BACC:  0.883
## Test set MCC:  0.769

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.473
## Test set imputation time:  0.136

Test set results

## Test set AUC:  0.587
## Test set BACC:  0.57
## Test set MCC:  0.149

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.503
## Test set imputation time:  0.184

Test set results

## Test set AUC:  0.919
## Test set BACC:  0.846
## Test set MCC:  0.695

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  1.674
## Test set imputation time:  0.48

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.917
## Test set MCC:  0.886

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  83.831
## Test set imputation time:  18.205

Test set results

## Test set AUC:  1
## Test set BACC:  0.977
## Test set MCC:  0.972

Missings overview